The attention mechanism has achieved great success in visual recognition. Many works devote themselves to improving the effectiveness of the attention mechanism by carefully designing the structure of attention operators. These works require extensive experiments to pick the best settings whenever the scenario changes, which consumes a great deal of time and computational resources. Moreover, a neural network often contains many layers, and most studies apply the same attention module to different layers, which hinders further improvement of the performance of self-attention mechanisms. To address these problems, we propose a self-attention module, SEM. Based on the input information of the attention module and a set of alternative attention operators, SEM can automatically decide how to select and integrate attention operators to compute attention maps. The effectiveness of SEM is demonstrated by extensive experiments on widely used benchmark datasets and popular self-attention networks.
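A minimal PyTorch sketch of this idea, not the authors' implementation: a lightweight gate, conditioned on the input, weights several candidate attention operators and blends their attention maps. The SE-style and CBAM-style candidates, the gating design, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):           # SE-style candidate operator (assumed)
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))      # global average pool -> channel weights
        return w.view(x.size(0), -1, 1, 1).expand_as(x)

class SpatialAttention(nn.Module):           # CBAM-style candidate operator (assumed)
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(s)).expand_as(x)

class SEMBlock(nn.Module):
    """Input-conditioned selection and integration of attention operators."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([ChannelAttention(channels), SpatialAttention()])
        self.gate = nn.Linear(channels, len(self.ops))  # lightweight selector
    def forward(self, x):
        # per-sample mixing weights over the candidate operators
        alpha = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)
        attn = sum(alpha[:, i].view(-1, 1, 1, 1) * op(x)
                   for i, op in enumerate(self.ops))
        return x * attn
```

In this reading, "selecting" an operator is the soft limit of the gate concentrating its weight on one candidate, so the whole block stays differentiable and trainable end to end.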
Recently, many effective self-attention modules have been proposed to boost model performance by exploiting the internal information of convolutional neural networks in computer vision. In general, previous works neglected the design of the pooling strategy of the self-attention mechanism, taking global average pooling for granted, which hinders further improvement of the performance of self-attention mechanisms. However, we empirically find and verify a phenomenon: a simple linear combination of global max pooling and global min pooling can produce pooling strategies that match or exceed the performance of global average pooling. Based on this empirical observation, we propose a simple self-attention module, SPENet, which adopts an adaptive pooling strategy based on global max pooling and global min pooling, together with a lightweight module for generating the attention map. The effectiveness of SPENet is demonstrated by extensive experiments on widely used benchmark datasets and popular self-attention networks.
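The reported pooling idea is concrete enough to sketch. Below, learnable coefficients mix the global-max and global-min descriptors, and an SE-style bottleneck (an assumption, not necessarily SPENet's lightweight module) turns the mixed descriptor into a channel attention map; this is a hedged reading of the abstract, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SPEAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # learnable mixing coefficients for the max/min descriptors
        self.a = nn.Parameter(torch.tensor(1.0))
        self.b = nn.Parameter(torch.tensor(0.0))
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        g_max = x.amax(dim=(2, 3))              # global max pooling, (B, C)
        g_min = x.amin(dim=(2, 3))              # global min pooling, (B, C)
        desc = self.a * g_max + self.b * g_min  # adaptive linear combination
        w = self.fc(desc).view(x.size(0), -1, 1, 1)
        return x * w
```

With `a=1, b=0` this degenerates to plain global max pooling; the claim in the abstract is that letting the network tune this mixture matches or beats the usual global-average descriptor.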
Audio commands are a preferred communication medium for keeping inspectors in the loop of civil infrastructure inspections performed by semi-autonomous drones. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model must be developed cost-effectively for the group and easily adapted when the group changes. This paper is motivated to build a multi-task deep learning model with a share-split architecture. This architecture allows the two classification tasks to share the feature extractor and then split the subject-specific and keyword-specific features intertwined in the extracted features through feature projection and collaborative training. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected in this study. The model achieves an average accuracy of 95.3% or higher in classifying keywords of any authorized inspector, and an average accuracy of 99.2% in speaker classification. Because of the richer keyword representations the model learns from the pooled training data, adapting the base model to a new inspector requires only a small amount of training data from that inspector, such as five utterances per keyword. Using the speaker classification scores for inspector verification achieves a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Furthermore, the paper demonstrates the applicability of the proposed model to larger groups on a public dataset. This paper provides a solution to the challenges facing AI-assisted human-robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity.
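A hedged sketch of the share-then-split idea described above: one shared acoustic encoder feeds two projection branches that separate keyword-specific and speaker-specific features, each with its own classification head. The GRU encoder, feature dimension, and layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ShareSplitModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_keywords=10, n_speakers=5):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)  # shared extractor
        self.proj_kw = nn.Linear(hidden, hidden)    # keyword-specific projection
        self.proj_spk = nn.Linear(hidden, hidden)   # speaker-specific projection
        self.head_kw = nn.Linear(hidden, n_keywords)
        self.head_spk = nn.Linear(hidden, n_speakers)

    def forward(self, x):                  # x: (B, T, feat_dim) acoustic frames
        _, h = self.encoder(x)             # final hidden state: (1, B, hidden)
        h = h.squeeze(0)
        kw_logits = self.head_kw(torch.relu(self.proj_kw(h)))
        spk_logits = self.head_spk(torch.relu(self.proj_spk(h)))
        return kw_logits, spk_logits

# Collaborative training would minimize the sum of both task losses, e.g.
# loss = ce(kw_logits, kw_labels) + ce(spk_logits, spk_labels)
```

Adapting to a new inspector then amounts to extending `head_spk` by one class and fine-tuning on that inspector's few utterances, while the shared encoder retains the keyword representations learned from the pooled data.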
In this paper, we consider a new variant of principal component analysis (PCA) that aims to simultaneously capture the grouping and/or sparsity structure of factor loadings. To achieve these goals, we adopt a non-convex truncated regularization with naturally adjustable sparsity and grouping effects, and propose Feature Grouping and Sparse Principal Component Analysis (FGSPCA). The proposed FGSPCA method encourages factor loadings with similar values to collapse into disjoint homogeneous groups for feature grouping, or into a special zero-valued group for feature selection, which helps reduce model complexity and increase model interpretability. In general, existing structured PCA methods require prior knowledge to construct the regularization term. The proposed FGSPCA, however, can simultaneously capture the grouping and/or sparsity structure of factor loadings without any prior information. To solve the resulting non-convex optimization problem, we propose an alternating algorithm that combines convex programming, the augmented Lagrangian method, and coordinate descent. Experimental results demonstrate the promising performance and efficiency of the new method on both synthetic and real-world datasets. An R implementation of FGSPCA is available on GitHub at https://github.com/higeeks/fgspca.
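The paper's alternating scheme is specific to its truncated penalty; as a generic illustration of the same alternating style, the sketch below solves a simpler l1-regularized sparse-PCA subproblem, alternating a coordinate-wise soft-thresholding update of the loadings with a closed-form orthogonal (Procrustes) update. It is not the FGSPCA algorithm.

```python
import numpy as np

def soft_threshold(z, lam):
    """Coordinate-wise proximal step for the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_pca(X, k=2, lam=0.1, n_iter=100):
    """X: (n, p) centered data; returns sparse loadings B of shape (p, k)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    A = Vt[:k].T                  # orthonormal initialization, (p, k)
    B = A.copy()
    G = X.T @ X
    for _ in range(n_iter):
        B = soft_threshold(G @ A, lam)      # sparse loading update (coordinate-wise)
        U, _, Wt = np.linalg.svd(G @ B, full_matrices=False)
        A = U @ Wt                          # closed-form Procrustes step
    norms = np.linalg.norm(B, axis=0)
    return B / np.where(norms > 0, norms, 1.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
X -= X.mean(axis=0)
print(sparse_pca(X, k=2, lam=5.0).round(2))
```

FGSPCA replaces the l1 proximal step with one tailored to its truncated penalty (which also drives similar loadings toward a common value) and handles the constraints via an augmented Lagrangian, per the abstract.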
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed the MONAI Consortium, an open-source community that is building standards for AI deployment in healthcare institutions and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem-solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes that take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
Smart City applications, such as traffic monitoring and disaster response, often use swarms of intelligent and cooperative drones to efficiently collect sensor data over different areas of interest and time spans. However, when the required sensing becomes spatio-temporally large and varying, a collective arrangement of sensing tasks to a large number of battery-restricted and distributed drones is challenging. To address this problem, we introduce a scalable and energy-aware model for planning and coordination of spatio-temporal sensing. The coordination model is built upon a decentralized multi-agent collective learning algorithm (EPOS) to ensure the scalability, resilience, and flexibility that existing approaches lack. Experimental results illustrate the outstanding performance of the proposed method compared to state-of-the-art methods. Analytical results contribute a deeper understanding of how the coordinated mobility of drones influences sensing performance. This novel coordination solution is applied to traffic monitoring using real-world data, demonstrating 46.45% more accurate and 2.88% more efficient detection of vehicles when drones become a scarce resource.
Unbiased learning to rank (ULTR) studies the problem of mitigating various biases from implicit user feedback data such as clicks, and has been receiving considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features, and a bias tower with bias-relevant inputs such as the position of a document. A successful factorization will allow the relevance tower to be exempt from biases. In this work, we identify a critical issue that existing ULTR methods have ignored: the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, the positions were determined by the logging policy, i.e., the previous production model, which possesses relevance information. We give both theoretical analysis and empirical results to show the negative effects on the relevance tower due to such a correlation. We then propose three methods to mitigate the negative confounding effects by better disentangling relevance and bias. Empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches.
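A minimal sketch of the two-tower factorization the abstract describes, in the common additive-in-logit form (an assumption; the paper's exact combination may differ): the relevance tower scores query-document features, the bias tower scores position, and only the relevance tower is used at serving time.

```python
import torch
import torch.nn as nn

class TwoTowerCTR(nn.Module):
    def __init__(self, feat_dim=64, max_position=50, hidden=32):
        super().__init__()
        self.relevance = nn.Sequential(          # relevance tower
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.bias = nn.Embedding(max_position, 1)  # position-based bias tower

    def forward(self, features, positions):
        # click logit = relevance logit + bias logit (additive factorization)
        return (self.relevance(features).squeeze(-1)
                + self.bias(positions).squeeze(-1))

# Training fits observed clicks, e.g. with nn.BCEWithLogitsLoss; at serving
# time documents are ranked by self.relevance(features) alone.
```

The confounding issue the paper raises is visible in this setup: because `positions` were assigned by a logging policy that already ranked by relevance, the embedding in the bias tower can absorb relevance signal instead of pure examination bias, defeating the factorization.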
Benefiting from its single-photon sensitivity, the single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge due to the complex hardware manufacturing process and the heavy noise disturbance of SPAD arrays. In this work, we introduce deep learning into SPAD, enabling super-resolution single-photon imaging over an order of magnitude, with significant enhancement of bit depth and imaging quality. We first studied the complex photon flow model of SPAD electronics to accurately characterize multiple physical noise sources, and collected a real SPAD image dataset (64 × 32 pixels, 90 scenes, 10 different bit depths, 3 different illumination fluxes, 2790 images in total) to calibrate the noise model parameters. With this real-world physical noise model, we synthesized, for the first time, a large-scale realistic single-photon image dataset (image pairs at 5 different resolutions up to a megapixel, 17250 scenes, 10 different bit depths, 3 different illumination fluxes, 2.6 million images in total) for subsequent network training. To tackle the severe super-resolution challenge of SPAD inputs with low bit depth, low resolution, and heavy noise, we further built a deep transformer network with a content-adaptive self-attention mechanism and gated fusion modules, which can mine global contextual features to remove multi-source noise and extract full-frequency details. We applied the technique to a series of experiments including macroscopic and microscopic imaging, microfluidic inspection, and Fourier ptychography. The experiments validate the technique's state-of-the-art super-resolution SPAD imaging performance, with a PSNR advantage of more than 5 dB over existing methods.
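The gated fusion idea can be sketched in a generic form (an assumption, not the authors' exact module): a 1x1 convolutional gate computes a per-pixel blend of two feature branches, e.g. a denoised branch and a detail branch.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, a, b):                 # a, b: (B, C, H, W) feature branches
        g = self.gate(torch.cat([a, b], dim=1))
        return g * a + (1.0 - g) * b         # per-pixel, per-channel convex blend
```

Because the gate is data-dependent, the network can lean on the smooth branch in noise-dominated regions and on the detail branch where high-frequency structure survives, which matches the abstract's goal of removing multi-source noise while extracting full-frequency details.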
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent years. This achievement can be ascribed in part to advances in AI subfields including Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). Deep learning, a sub-field of machine learning that employs artificial neural network concepts, has enabled the most rapid growth in these domains. As a result, the integration of vision and language has attracted a great deal of attention. The tasks in this area have been created in such a way that they properly exemplify the concepts of deep learning. In this review paper, we provide a thorough and extensive review of state-of-the-art approaches and key model design principles, and discuss existing datasets, methods, problem formulations, and evaluation measures for VQA and visual reasoning tasks, in order to understand vision-and-language representation learning. We also present some potential future paths in this field of research, with the hope that our study may generate new ideas and novel approaches to handle existing difficulties and develop new applications.
One of the key challenges in deploying RL to real-world applications is adapting to variations of unknown environment contexts, such as changing terrains in robotic tasks and fluctuating bandwidth in congestion control. Existing works on adaptation to unknown environment contexts either assume the contexts are the same for the whole episode or assume the context variables are Markovian. However, in many real-world applications, the environment context usually stays stable for a stochastic period and then changes in an abrupt and unpredictable manner within an episode, resulting in a segment structure that existing works fail to address. To leverage the segment structure of piecewise-stable contexts in real-world applications, in this paper we propose a Segmented Context Belief Augmented Deep (SeCBAD) RL method. Our method can jointly infer the belief distribution over the latent context with the posterior over the segment length, and perform more accurate belief context inference with observed data within the current context segment. The inferred belief context can be leveraged to augment the state, leading to a policy that can adapt to abrupt variations in context. We demonstrate empirically that SeCBAD can infer context segment length accurately and outperform existing methods on a toy grid-world environment and MuJoCo tasks with piecewise-stable contexts.
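A minimal sketch of the belief-augmentation step (generic; SeCBAD's joint inference over context and segment length is not reproduced here): a recurrent context encoder summarizes the transitions of the current segment into a belief vector, which is concatenated to the state before the policy head. All dimensions and the discrete-action head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BeliefAugmentedPolicy(nn.Module):
    def __init__(self, state_dim=8, belief_dim=16, n_actions=4):
        super().__init__()
        # encodes (state, action, reward, next_state) tuples: 2*state_dim + 2 dims
        self.context_enc = nn.GRU(2 * state_dim + 2, belief_dim, batch_first=True)
        self.policy = nn.Sequential(
            nn.Linear(state_dim + belief_dim, 64), nn.Tanh(),
            nn.Linear(64, n_actions))

    def forward(self, state, segment_transitions):
        # segment_transitions: (B, T, 2*state_dim+2); in SeCBAD's spirit, the
        # sequence would be reset at inferred segment boundaries so the belief
        # reflects only the current context segment
        _, belief = self.context_enc(segment_transitions)   # (1, B, belief_dim)
        logits = self.policy(torch.cat([state, belief.squeeze(0)], dim=-1))
        return torch.distributions.Categorical(logits=logits)
```

The piece this sketch deliberately omits is the posterior over segment length: detecting where a segment ends is what lets the encoder discard stale transitions after an abrupt context change.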